Ebola Outbreak 2014

Data retrieval

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
In [2]:
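# pd.DataFrame.from_csv has been removed from pandas; read_csv with
# parse_dates=True keeps the date index that from_csv produced by default.
df = pd.read_csv('https://raw.githubusercontent.com/cmrivers/ebola/master/country_timeseries.csv',
                 index_col=0, parse_dates=True)
df = df.sort_index()   # oldest report first
df = df.bfill()        # fill gaps with the next available report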
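
The effect of sorting and back-filling can be checked on a small inline table. This is a minimal sketch on made-up values: the column names mimic the dataset, but the numbers and the `toy` frame are hypothetical.

```python
import io
import pandas as pd

# Hypothetical miniature of the time series: newest row first, with gaps.
csv_text = """Date,Day,Deaths_Guinea,Deaths_Liberia
3/24/2014,2,,5
3/23/2014,1,30,
3/22/2014,0,29,4
"""

toy = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
toy = toy.sort_index()   # oldest date first
toy = toy.bfill()        # fill each gap with the next later observation

print(toy)
```

Back-filling only looks forward in time, so a gap in the most recent row (here `Deaths_Guinea` on the last date) stays missing.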

Cases per Country

In [3]:
# Select the case columns (the original filter matched 'deaths' by mistake).
cases_titles = [k for k in df.columns if 'cases' in k.lower()]
df.plot(y=cases_titles)
Out[3]:
<matplotlib.axes.AxesSubplot at 0x7fee9093c9d0>

Deaths per Country

In [4]:
death_titles = [k for k in df.columns if 'deaths' in k.lower()]
df.plot(y=death_titles)
plt.legend()
Out[4]:
<matplotlib.legend.Legend at 0x7fee8e8a6650>

Total Deaths

In [5]:
df['total deaths'] = df[death_titles].sum(axis=1)
df.plot(y='total deaths', 
        title='Total Deaths in \n 2014 Ebola Outbreak')
plt.ylabel('Total Deaths')
Out[5]:
<matplotlib.text.Text at 0x7fee8e7e2150>

Exponential Fit

Plot log10(total deaths) against day with a linear fit. A straight line on the log scale corresponds to exponential growth.

In [6]:
import seaborn as sn
df['log total deaths'] = np.log10(df['total deaths'].values)
# Recent seaborn releases require keyword arguments here.
sn.lmplot(x='Day', y='log total deaths', data=df)
Out[6]:
<seaborn.axisgrid.FacetGrid at 0x7fee8e87b210>
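
The slope of a straight line fitted to log10(deaths) translates directly into a doubling time: if log10(y) = slope * day + intercept, the count doubles every log10(2) / slope days. The following is a minimal sketch of that conversion on synthetic, noise-free data (the values are hypothetical, not taken from the outbreak dataset), using `np.polyfit` rather than the seaborn fit above.

```python
import numpy as np

# Synthetic, noise-free series that doubles every 30 days (hypothetical values).
days = np.arange(0, 200, 10)
deaths = 50 * 2.0 ** (days / 30.0)

# Fit a straight line to log10(deaths) vs. day: log10(y) = slope * day + intercept.
slope, intercept = np.polyfit(days, np.log10(deaths), 1)

doubling_time = np.log10(2) / slope   # days for the count to double
initial_count = 10 ** intercept       # fitted value at day 0

print(f"doubling time: {doubling_time:.1f} days, initial count: {initial_count:.1f}")
```

The same two lines of arithmetic would apply to `df['Day']` and `df['log total deaths']` once the columns are free of missing values.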

Fit statistics

In [7]:
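import statsmodels.api as sm

# Note: sm.OLS takes (endog, exog) and does not add an intercept. As written,
# 'Day' is the dependent variable and log total deaths is the only regressor.
ols = sm.OLS(df['Day'].values, df['log total deaths'].values)
ols.fit().summary()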
Out[7]:
OLS Regression Results
Dep. Variable:     y                  R-squared:           0.867
Model:             OLS                Adj. R-squared:      0.866
Method:            Least Squares      F-statistic:         574.1
Date:              Mon, 27 Oct 2014   Prob (F-statistic):  2.49e-40
Time:              19:32:15           Log-Likelihood:      -466.55
No. Observations:  89                 AIC:                 935.1
Df Residuals:      88                 BIC:                 937.6
Df Model:          1

      coef      std err   t        P>|t|   [95.0% Conf. Int.]
x1    41.7750   1.744     23.960   0.000   38.310   45.240

Omnibus:         34.203   Durbin-Watson:      0.005
Prob(Omnibus):   0.000    Jarque-Bera (JB):   5.737
Skew:            0.034    Prob(JB):           0.0568
Kurtosis:        1.758    Cond. No.           1.00
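
The summary lists a single coefficient, `x1`, and no constant term, so the fitted line is forced through the origin. The sketch below illustrates why that matters, using plain NumPy least squares on synthetic data (hypothetical values, with log10(deaths) as the response and day as the regressor). In statsmodels, the analogous step would be to wrap the regressor in `sm.add_constant` before passing it to `sm.OLS`.

```python
import numpy as np

# Synthetic stand-in for the data (hypothetical values): log10(deaths) = 0.01 * day + 2.
day = np.arange(0.0, 200.0, 5.0)
log_deaths = 0.01 * day + 2.0

# Design matrix with only the regressor: forces the fitted line through the origin.
X_no_const = day[:, None]
coef_no_const, *_ = np.linalg.lstsq(X_no_const, log_deaths, rcond=None)

# Design matrix with a column of ones: slope and intercept are both estimated.
X_const = np.column_stack([np.ones_like(day), day])
(intercept, slope), *_ = np.linalg.lstsq(X_const, log_deaths, rcond=None)

print("slope without intercept:", coef_no_const[0])
print("slope with intercept:   ", slope, "intercept:", intercept)
```

Without the column of ones, the slope has to absorb the offset at day 0 and comes out larger than the true growth rate, which would distort any doubling time derived from it.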